19 research outputs found

An overview & analysis of sequence-to-sequence emotional voice conversion

Author: Aslan Ilhan
Jing Xin
Schuller Björn W.
Song Meishu
Triantafyllopoulos Andreas
Yang Zijiang
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2022

Emotional voice conversion (EVC) focuses on converting a speech utterance from a source to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond. However, EVC remains an unsolved research problem with several challenges. In particular, as speech rate and rhythm are two key factors of emotional conversion, models have to generate output sequences of differing length. Sequence-to-sequence modelling has recently emerged as a competitive paradigm for models that can overcome those challenges. In an attempt to stimulate further research in this promising new direction, recent sequence-to-sequence EVC papers were systematically investigated and reviewed from six perspectives: their motivation, training strategies, model architectures, datasets, model inputs, and evaluation methods. This information is organised to provide the research community with an easily digestible overview of the current state-of-the-art. Finally, we discuss existing challenges of sequence-to-sequence EVC.

arXiv.org e-Print Archive

OPUS Augsburg

Identifying languages in a novel dataset: ASMR-whispered speech

Author: Jing Xin
Parada-Cabaleiro Emilia
Schuller Björn
Song Meishu
Yamamoto Yoshiharu
Yang Zijiang
Publication date: 01/01/2023

Introduction: The Autonomous Sensory Meridian Response (ASMR) is a combination of sensory phenomena involving electrostatic-like tingling sensations, which emerge in response to certain stimuli. Despite the overwhelming popularity of ASMR on social media, no open-source databases of ASMR-related stimuli are yet available, which makes this phenomenon mostly inaccessible to the research community and thus almost completely unexplored. In this regard, we present the ASMR Whispered-Speech (ASMR-WS) database. Methods: ASMR-WS is a novel database of whispered speech, specifically tailored to promote the development of ASMR-like unvoiced Language Identification (unvoiced-LID) systems. The ASMR-WS database encompasses 38 videos, for a total duration of 10 h and 36 min, and includes seven target languages (Chinese, English, French, Italian, Japanese, Korean, and Spanish). Along with the database, we present baseline results for unvoiced-LID on the ASMR-WS database. Results: Our best results on the seven-class problem, based on segments of 2 s length, a CNN classifier, and MFCC acoustic features, achieved an unweighted average recall of 85.74% and an accuracy of 90.83%. Discussion: For future work, we would like to focus more deeply on the duration of speech samples, as we see varied results with the combinations applied herein. To enable further research in this area, the ASMR-WS database, as well as the partitioning considered in the presented baseline, is made accessible to the research community.

OPUS Augsburg

Coughing-based recognition of Covid-19 with spatial attentive ConvLSTM recurrent neural networks

Author: Liu Shuo
Meng Hao
Parada-Cabaleiro Emilia
Schuller Björn W.
Song Meishu
Yan Tianhao
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2021

OPUS Augsburg

Supervised contrastive learning for game-play frustration detection from speech

Author: Baird Alice
Liu Shuo
Milling Manuel
Parada-Cabaleiro Emilia
Schuller Björn W.
Song Meishu
Yang Zijiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

OPUS Augsburg

Frustration recognition from speech during game interaction using wide residual networks

Author: Liu Shuo
Mallol-Ragolta Adria
Parada-Cabaleiro Emilia
Ren Zhao
Schuller Björn W.
Song Meishu
Yang Zijiang
Zhao Ziping
Publication date: 28/08/2020

Background: Although frustration is a common emotional reaction while playing games, an excessive level of frustration can harm users’ experiences, discouraging them from undertaking further game interactions. The automatic detection of players’ frustration enables the development of adaptive systems which, through real-time difficulty adjustment, would adapt the game to the user’s specific needs, thus maximising players’ experience and guaranteeing the game’s success. To this end, we present our speech-based approach for the automatic detection of frustration during game interactions, a specific task still under-explored in research. Method: The experiments were performed on the Multimodal Game Frustration Database (MGFD), an audiovisual dataset, collected within the Wizard-of-Oz framework, specially tailored to investigate verbal and facial expressions of frustration during game interactions. We explored the performance of a variety of acoustic feature sets, including Mel-Spectrograms and Mel-Frequency Cepstral Coefficients (MFCCs), as well as the low-dimensional knowledge-based acoustic feature set eGeMAPS. Given the steadily increasing improvements achieved by the use of Convolutional Neural Networks (CNNs) in speech recognition tasks, and unlike the MGFD baseline, which is based on a Long Short-Term Memory (LSTM) architecture and a Support Vector Machine (SVM) classifier, in the present work we consider commonly used CNNs, including ResNets, VGG, and AlexNet. Furthermore, given the still open debate on the suitability of shallow vs deep networks, we also examine the performance of two of the latest deep CNNs, i.e., WideResNets and EfficientNet. Results: Our best result, achieved with WideResNets and Mel-Spectrogram features, increases the system performance from 58.8 % Unweighted Average Recall (UAR) to 93.1 % UAR for speech-based automatic frustration recognition.

OPUS Augsburg

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Predicting group work performance from physical handwriting features in a smart English classroom

Author: Chen Bin
Hidaka Ichiro
Liu Shuo
Okabayashi Keiju
Parada-Cabaleiro Emilia
Qian Kun
Schuller Björn
Song Meishu
Togami Kazumasa
Wang Yueheng
Yamamoto Yoshiharu
Yang Zijiang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021

OPUS Augsburg

An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety

Author: Han Jing
Ji Wei
Koike Tomoya
Li Xiao
Liu Juan
Liu Shuo
Qian Kun
Ren Zhao
Schuller Björn W.
Song Meishu
Yamamoto Yoshiharu
Yang Zijiang
Zhang Zixing
Zheng Huaiyuan
Publication date: 01/01/2020

The COVID-19 outbreak was announced as a global pandemic by the World Health Organisation in March 2020 and has affected a growing number of people in the past few weeks. In this context, advanced artificial intelligence techniques are brought to the fore to fight against and reduce the impact of this global health crisis. In this study, we focus on developing some potential use-cases of intelligent speech analysis for patients diagnosed with COVID-19. In particular, by analysing speech recordings from these patients, we construct audio-only-based models to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety. For this purpose, two established acoustic feature sets and support vector machines are utilised. Our experiments show that an average accuracy of .69 is obtained when estimating the severity of illness, which is derived from the number of days in hospitalisation. We hope that this study can foster an extremely fast, low-cost, and convenient way to automatically detect the COVID-19 disease.

arXiv.org e-Print Archive

OPUS Augsburg

Crossref